Fast range query estimation by N-level tree histograms

Authors

  • Francesco Buccafurri
  • Gianluca Lax
Abstract

Histograms are a lossy compression technique widely applied in many contexts, such as query optimization, statistical and temporal databases, OLAP applications, and data streams. In most cases, the accuracy with which original information can be reconstructed from the histogram plays a crucial role. Thus, several proposals for constructing histograms that try to maximize accuracy have been made in the recent past. Besides bucket-based histograms (i.e., histograms whose construction is driven by the search for a “good” domain partition), there are newer histograms characterized by more complex structures (for instance, wavelet-based histograms). This paper presents a new histogram, called nLT, belonging to the latter class. It is based on a hierarchical decomposition of the original data distribution kept in a full binary tree. This tree, containing a set of pre-computed hierarchical queries, uses bit saving for representing integer numbers, so that the reduced storage space allows us to increase the tree resolution and, consequently, its accuracy. Experimental comparison shows the superiority of nLT w.r.t. the state-of-the-art histograms.
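To make the idea of a hierarchical decomposition kept in a full binary tree more concrete, the following is a minimal sketch of a tree histogram that answers range-sum queries from pre-computed node totals. It is not the nLT structure itself: the abstract does not give its details, and the bit-saving encoding is omitted entirely. The names `Node`, `build`, `estimate`, and the `levels` parameter are hypothetical, and the uniform-spread assumption inside a leaf is an illustrative choice, not something stated in the abstract.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    lo: int                        # inclusive start of the covered index range
    hi: int                        # exclusive end of the covered index range
    total: int                     # pre-computed sum of frequencies in [lo, hi)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(freqs: List[int], lo: int, hi: int, levels: int) -> Node:
    """Build a full binary tree with at most `levels` levels over freqs[lo:hi]."""
    node = Node(lo, hi, sum(freqs[lo:hi]))
    if levels > 1 and hi - lo > 1:
        mid = (lo + hi) // 2
        node.left = build(freqs, lo, mid, levels - 1)
        node.right = build(freqs, mid, hi, levels - 1)
    return node


def estimate(node: Node, a: int, b: int) -> float:
    """Estimate the sum of frequencies in [a, b) using only stored totals."""
    a, b = max(a, node.lo), min(b, node.hi)
    if a >= b:
        return 0.0                 # query does not overlap this node
    if a == node.lo and b == node.hi:
        return float(node.total)   # node fully covered: stored total is exact
    if node.left is None:
        # Partially covered leaf: assume frequencies are spread uniformly
        # (an assumption of this sketch).
        return node.total * (b - a) / (node.hi - node.lo)
    return estimate(node.left, a, b) + estimate(node.right, a, b)


freqs = [4, 0, 2, 6, 1, 1, 8, 2]
root = build(freqs, 0, len(freqs), levels=3)
print(estimate(root, 2, 6))  # 10.0, exact because the query aligns with nodes
print(estimate(root, 1, 3))  # 6.0, an approximation (the true sum is 2)
```

In this sketch, queries aligned with node boundaries are answered exactly, while queries that cut through a leaf are interpolated. Adding levels shrinks the leaves and so reduces the interpolation error, which is the general trade-off between storage and resolution that the abstract refers to.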


Similar resources

Substring Count Estimation in Extremely Long Strings

To estimate the number of substring matches against string data, count suffix trees (CS-tree) have been used as a kind of alphanumeric histograms. Although the trees are useful for substring count estimation in short data strings (e.g. name or title), they reveal several drawbacks when the target is changed to extremely long strings. First, it becomes too hard or at least slow to build CS-trees...


A Cardinality Estimation Approach Based on Two Level Histograms

For the mainstream relational database management systems, histograms play important roles in cardinality estimation. The main histogram-based cardinality estimation approaches can be classified into two categories: proactive approaches and reactive approaches. For the former, histograms are constructed and updated by periodical data scan which is also the essential reason affecting the accurac...


Estimating the Selectivity of XML Path Expression with Predicates by Histograms

Selectivity estimation of path expressions in querying XML data plays an important role in query optimization. A path expression may contain multiple branches with predicates, each of which having its impact on the selectivity of the entire query. In this paper, we propose a novel method based on 2-dimensional value histograms to estimate the selectivity of path expressions embedded with predic...


Query-Condition-Aware Histograms in Selectivity Estimation Method

The paper shows an adaptive approach to the query selectivity estimation problem for queries with a range selection condition based on continuous attributes. The selectivity factor estimates a size of data satisfying a query condition. This estimation is calculated at the initial stage of the query processing for choosing the optimal query execution plan. A non-parametric estimator of probabili...


A Histogram Utilizing the Cluster Information

Histograms are summary structures of large datasets, which are mainly used for selectivity estimation during query optimization. Selectivity estimation is the fast approximation of query result size. In this paper, we focus on multi-dimensional histograms, especially bidimensional histograms. At the time of selectivity estimation, buckets partially overlapping with a query return approximated r...



Journal:
  • Data Knowl. Eng.

Volume 51, Issue

Pages  -

Publication date: 2004